The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
Author: Yinyu Ye
Abstract
We prove that the classic policy-iteration method (Howard 1960) and the original simplex method with the most-negative-reduced-cost pivoting rule (Dantzig 1947) are strongly polynomial-time algorithms for solving the Markov decision problem (MDP) with a fixed discount rate. Furthermore, the computational complexity of the policy-iteration and simplex methods is superior to that of the only known strongly polynomial-time interior-point algorithm (Ye 2005) for solving this problem. The result is surprising since the simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming (LP) problem (Klee and Minty 1972), the simplex method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates (Melekopoglou and Condon 1994), and the policy-iteration method was recently shown to be exponential for solving undiscounted MDPs under the average cost criterion (Fearnley 2010). We also extend the result to solving MDPs with transient substochastic transition matrices whose spectral radii uniformly minorize one.
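As a concrete reference point for the method the abstract analyzes, here is a minimal NumPy sketch of Howard's policy iteration for a tabular discounted-cost MDP. The array layout and names (`P`, `c`, `gamma`) are our own illustrative assumptions, not notation from the paper.

```python
import numpy as np

def policy_iteration(P, c, gamma):
    """Howard's policy iteration for a discounted-cost MDP (sketch).

    P     -- (n, m, n) array, P[s, a, s2] = transition probability
    c     -- (n, m) array of immediate costs
    gamma -- fixed discount factor in [0, 1)
    """
    n, m = c.shape
    policy = np.zeros(n, dtype=int)             # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = c_pi exactly.
        P_pi = P[np.arange(n), policy]          # (n, n) matrix under policy
        c_pi = c[np.arange(n), policy]          # (n,) costs under policy
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, c_pi)
        # Policy improvement: every state switches to its greedy action.
        q = c + gamma * (P @ v)                 # (n, m) one-step lookahead
        new_policy = q.argmin(axis=1)
        if np.array_equal(new_policy, policy):  # no improving switch remains
            return policy, v                    # current policy is optimal
        policy = new_policy
```

In these terms, the paper's claim is that for a fixed `gamma` the number of improvement rounds in the loop above is bounded by a polynomial in n and m alone, independent of the cost and transition data.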
Related resources
The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
In this short paper we prove that the classic simplex method with the most-negative-reduced-cost pivoting rule (Dantzig 1947) for solving the Markov decision problem (MDP) with a fixed discount rate is a strongly polynomial-time algorithm. The result seems surprising since this very pivoting rule was shown to be exponential for solving a general linear programming (LP) problem, and the simplex (...
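For context on the entry above: the simplex method in this line of work runs on the standard LP formulation of the discounted MDP. Below is a sketch of that textbook formulation in our own notation, not quoted from the paper.

```latex
% State-action frequency LP of a discounted-cost MDP with states j,
% actions a, costs c(i,a), transitions p(j|i,a), and discount gamma:
\begin{align*}
\min_{x \ge 0}\quad & \sum_{i,a} c(i,a)\, x_{ia} \\
\text{s.t.}\quad    & \sum_{a} x_{ja} - \gamma \sum_{i,a} p(j \mid i,a)\, x_{ia} = 1
                      \quad \text{for every state } j.
\end{align*}
% Basic feasible solutions correspond to deterministic policies.  With v
% the value vector of the current policy, the reduced cost of x_{ia} is
%   cbar(i,a) = c(i,a) + gamma * sum_j p(j|i,a) v_j - v_i,
% and Dantzig's rule enters the variable whose reduced cost is most
% negative, i.e., the single most attractive policy switch.
```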
The simplex method is strongly polynomial for deterministic Markov decision processes
We prove that the simplex method with the highest-gain/most-negative-reduced-cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the discount factor. For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3 m^2 log^2 n) iterations if the discount factor is uniform and O(n^5 m^3 log^2 n) iterations if e...
Policy Iteration is well suited to optimize PageRank
The question of whether the policy-iteration algorithm (PI) for solving Markov Decision Processes (MDPs) has exponential or (strongly) polynomial complexity has attracted much attention in the last 50 years. Recently, Fearnley proposed an example on which PI needs an exponential number of iterations to converge. However, it has been observed that Fearnley's example leaves open the possib...
Markov Chain Anticipation for the Online Traveling Salesman Problem by Simulated Annealing Algorithm
The arc costs are assumed to be online parameters of the network, and decisions must be made while the costs of arcs are still unknown. The policies determine the permitted nodes and arcs to traverse, and they are generally defined according to the departure nodes of the current policy nodes. In tours created online, arc costs are not available to decision makers. The online traversed nodes are f...
A Polynomial Time Branch and Bound Algorithm for the Single Item Economic Lot Sizing Problem with All Units Discount and Resale
The purpose of this paper is to present a polynomial-time algorithm which determines the lot sizes for purchased components in Material Requirements Planning (MRP) environments with deterministic time-phased demand and zero lead time. In this model, backlogging is not permitted, the unit purchasing price is based on the all-units discount system, and resale of excess units is possible at the order...
Journal: Math. Oper. Res.
Volume: 36, Issue: -
Pages: -
Published: 2011